{smcl}
{cmd:help qqvalue}{right: ({browse "http://www.stata-journal.com/article.html?article=st0209":SJ10-4: st0209})}
{hline}

{title:Title}

{p2colset 5 16 18 2}{...}
{p2col :{hi:qqvalue} {hline 2}}Generate frequentist q-values by inverting multiple-test procedures{p_end}
{p2colreset}{...}


{title:Syntax}
{p 8 15 2}
{cmd:qqvalue}
{varname} 
{ifin} 
[{cmd:,} {cmdab:me:thod}{cmd:(}{it:method}{cmd:)} 
{cmdab:be:stof}{cmd:(}{it:#}{cmd:)} 
{opth qv:alue(newvar)}
 {opth np:value(newvar)}
 {opth ra:nk(newvar)}
 {opth sv:alue(newvar)}
 {opth rv:alue(newvar)}
 {cmd:float} {cmd:fast}]

{pstd}
where {it:method} is one of

{pstd}
{cmd:bonferroni} | {cmd:sidak} | {cmd:holm} | {cmd:holland} | {cmd:hochberg} | {cmd:simes} | {cmd:yekutieli}

{pstd}
{cmd:by} {varlist}{cmd::} can be used with {cmd:qqvalue}; see {manlink D by}.
If {cmd:by} {varlist}{cmd::} is used, then all generated variables are
calculated using the specified multiple-test procedure within each by-group
defined by the variables in the {varlist}.


{title:Description}

{pstd}
{cmd:qqvalue} is similar to the {browse "http://www.r-project.org/":R} function
{cmd:p.adjust}.  It takes as input a single variable, which is assumed to contain p-values
calculated for multiple comparisons, in a dataset with one observation per
comparison.  It outputs a new variable--calculated by inverting a
multiple-test procedure specified by the user--containing the frequentist
q-values corresponding to these p-values.  Each q-value is,
for its corresponding p-value, the minimum uncorrected p-value threshold for
which that p-value would be in the discovery set, assuming that the specified
multiple-test procedure was used on the same set of input p-values to generate
a corrected p-value threshold.  These minimum uncorrected p-value thresholds
may represent familywise error rates or false discovery rates, depending on
the procedure used.  {cmd:qqvalue}'s options may be used to output other variables that
contain the various intermediate results used in calculating the
q-values.  The multiple-test procedures available for {cmd:qqvalue} are a
subset of those available using the {helpb multproc} command of the 
{helpb smileplot} package (Newson and the ALSPAC Study Team 2010).


{title:Options}

{phang}
{cmd:method(}{it:method}{cmd:)} specifies the multiple-test procedure
to be used for calculating the q-values from the input p-values.  The
{it:method} may be {cmd:bonferroni}, {cmd:sidak}, {cmd:holm},
{cmd:holland}, {cmd:hochberg}, {cmd:simes}, or {cmd:yekutieli}.  These method
names specify that the q-values will be calculated from the input p-values by
inverting the multiple-test procedure specified by the {cmd:method()} option
of the same name for the {cmd:multproc} command of the {cmd:smileplot}
package (Newson and the ALSPAC Study Team 2010).  The default is
{cmd:method(bonferroni)}. 
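
{pmore}
As an illustrative sketch, q-values from several procedures can be generated
side by side for comparison.  This assumes that the input p-values are stored
in a variable named {cmd:p}; the output variable names are hypothetical.

{pmore2}{cmd:. qqvalue p, method(bonferroni) qvalue(q_bonf)}{p_end}
{pmore2}{cmd:. qqvalue p, method(holm) qvalue(q_holm)}{p_end}
{pmore2}{cmd:. qqvalue p, method(simes) qvalue(q_simes)}{p_end}
{pmore2}{cmd:. list p q_bonf q_holm q_simes}{p_end}

{pmore}
For the same input p-values, one would expect the Holm q-values to be no
greater than the Bonferroni q-values and the Simes q-values to be no greater
than the Holm q-values.  Note, however, that the Simes procedure controls the
false discovery rate rather than the familywise error rate, so the columns are
not directly interchangeable.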

{phang}
{cmd:bestof(}{it:#}{cmd:)} specifies an integer.  If the {cmd:bestof()}
option is specified and {it:#} is greater than the number of input p-values, then
the q-values are calculated assuming that the input p-values are a subset
(usually the smallest) of a superset of {it:#} p-values.  If the {cmd:method()}
option specifies a one-step method (such as {cmd:bonferroni} or {cmd:sidak}),
then the q-values do not depend on the other p-values in the superset, but
only on the number of p-values in the superset.  If the {cmd:method()} option
specifies a step-down method (such as {cmd:holm} or {cmd:holland}), then it is
assumed that all the other p-values in the superset are greater than the
largest of the input p-values.  If the {cmd:method()} option specifies a
step-up method (such as {cmd:hochberg}, {cmd:simes}, or {cmd:yekutieli}), then
it is assumed that all the other p-values in the superset are equal to one,
which implies that the q-values will be conservative and will define an upper bound to
the respective q-values that would have been calculated if we knew the other
p-values in the superset.  If {cmd:bestof()} is unspecified (or nonpositive),
then the input p-values are assumed to be the full set of p-values calculated.
The {cmd:bestof()} option is useful if the input p-values are known (or
suspected) to be the smallest of a greater set of p-values that we do not
know.  This often happens if the input p-values are from a genome scan
reported in the literature.
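
{pmore}
For example, suppose (hypothetically) that the dataset in memory holds the 20
smallest p-values reported from a genome scan of 500,000 tests, stored in a
variable named {cmd:p}.  A call along the following lines would then treat
them as the best of 500,000.  The count and the variable names are
illustrative assumptions only.

{pmore2}{cmd:. qqvalue p, method(simes) bestof(500000) qvalue(q_scan)}{p_end}

{pmore}
Because {cmd:simes} is a step-up method, the unseen p-values are taken to be
equal to one, so the resulting q-values should be read as upper bounds.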

{p 4 8 2} {opth qvalue(newvar)} specifies the name of a new output
variable containing the q-values calculated from the input
p-values. The new output variable is generated using the multiple-test procedure specified by the {cmd:method()}
option.

{p 4 8 2} {opth npvalue(newvar)} specifies the name of a new output
variable to be generated.  It contains in each observation the total number of
p-values in the sample of observations specified by the {helpb if} and 
{helpb in} qualifiers or in the by-group containing that observation if the
{helpb by:by:} prefix is specified.

{p 4 8 2}
{opth rank(newvar)} specifies the name of a new output variable to be generated.  It
contains in each observation the rank of the corresponding p-value from
the lowest to the highest.  Tied p-values are ranked according to their
position in the input dataset.  If the {cmd:by:} prefix is specified, then the
ranks are defined within the by-group.

{p 4 8 2}
{opth svalue(newvar)} specifies the name of a new output variable to be
generated, which contains the s-values calculated from the input p-values.  The
s-values are an intermediate result; they are calculated in the course of calculating
the q-values and are used mainly for validation.  They are calculated from
the input p-values by inverting the formulas used for the rank-specific
critical p-value thresholds that are calculated by the {cmd:multproc} command of the
{cmd:smileplot} package.  These rank-specific p-value thresholds are returned
in the generated variable specified by the {cmd:critical()} option of
{cmd:multproc}.  The s-values may be greater than one.

{p 4 8 2}
{opth rvalue(newvar)} specifies the name of a new output variable to be
generated, which contains the r-values calculated from the input p-values.
The r-values are an intermediate result; they are calculated in the course of
calculating the q-values and are used mainly for validation.  They are
calculated from the s-values by truncating the s-values to a maximum
of one.  The q-values are calculated from the r-values using a
procedure that is dependent on the multiple-test procedure specified by the
{cmd:method()} option.  If the multiple-test procedure is a one-step
procedure (such as {cmd:bonferroni} or {cmd:sidak}), then the q-values are
equal to the corresponding r-values.  If the multiple-test procedure is a
step-down procedure (such as {cmd:holm} or {cmd:holland}), then the
q-value for each p-value is equal to the cumulative maximum of all the
r-values corresponding to p-values of rank equal to or less than that
p-value.  If the multiple-test procedure is a step-up procedure (such as
{cmd:hochberg}, {cmd:simes}, or {cmd:yekutieli}), then the q-value for each
p-value is equal to the cumulative minimum of all the r-values
corresponding to p-values of rank equal to or greater than that p-value.
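
{pmore}
The step-up case can be sketched by hand as follows.  This is only a minimal
illustration under stated assumptions, not the code used by {cmd:qqvalue}: it
assumes that the Simes s-value for the p-value of rank {it:i} out of {it:m} is
{it:p}*{it:m}/{it:i} (the inverse of the usual Simes threshold), that every
observation in memory holds a nonmissing p-value in a variable named {cmd:p},
and that {cmd:bestof()} is not in use.  All variable names ending in
{cmd:_chk} are hypothetical.

{pmore2}{cmd:. sort p, stable}{p_end}
{pmore2}{cmd:. generate long rank_chk = _n}{p_end}
{pmore2}{cmd:. generate double s_chk = p*_N/rank_chk}{p_end}
{pmore2}{cmd:. generate double r_chk = min(s_chk, 1)}{p_end}
{pmore2}{cmd:. gsort -rank_chk}{p_end}
{pmore2}{cmd:. generate double q_chk = r_chk}{p_end}
{pmore2}{cmd:. replace q_chk = min(q_chk, q_chk[_n-1]) if _n > 1}{p_end}
{pmore2}{cmd:. sort rank_chk}{p_end}

{pmore}
The {cmd:replace} statement works through the observations in order, from the
highest rank to the lowest, so each {cmd:q_chk} becomes the minimum of the
r-values of equal or greater rank.  The result can then be compared with the
variable generated by {cmd:qvalue()} under {cmd:method(simes)}.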

{p 4 8 2} {cmd:float} specifies that the output variables specified by the
{cmd:qvalue()}, {cmd:rvalue()}, and {cmd:svalue()} options be created as
variables of {help data_types:type} {cmd:float}.  If {cmd:float} is absent,
then these variables are created as variables of type {cmd:double}.  Whether
or not {cmd:float} is specified, all generated variables are stored to the
lowest precision possible without loss of information.

{p 4 8 2}{cmd:fast} is an option for programmers.  It specifies that
{cmd:qqvalue} will not take any action to restore the original
data in the event of failure or if the user presses {cmd:Break}.


{title:Remarks}

{pstd}
Multiple-test procedures are reviewed in Newson and the ALSPAC Study Team
(2003, 2010) and are described in the online help for {helpb multproc}.  Both of
these sources contain extensive references for further reading.

{pstd}
The {cmd:qqvalue} package is similar to the R function {cmd:p.adjust}, which
also calculates frequentist q-values corresponding to multiple-test
procedures.  In the online documentation for {cmd:p.adjust} in R,
the q-values are referred to as "adjusted p-values", although many users
refer to them as "q-values".  There is no clear consensus regarding the
correct terminology to use, even among statisticians.  The term q-value was
introduced in Storey (2003) to describe a minimum positive false discovery
rate (pFDR) under which a p-value will be included in a discovery set,
assuming that this discovery set is defined to control the pFDR.  The pFDR is
a quantity defined for empirical Bayesian methods.  By contrast, the
multiple-test procedures used by {cmd:qqvalue} and {cmd:p.adjust} define the
discovery set to control either the familywise error rate or the false
discovery rate, both of which are defined for purely frequentist
methods.  For this reason, I originally used the term quasi-q-values to
denote frequentist q-values, and I chose the name {cmd:qqvalue} for the package
to compute these.  However, I was later advised that the prefix quasi- was
not really necessary.  I therefore now simply use the term q-values, or
frequentist q-values if I need to distinguish them from Bayesian q-values.

{pstd}
{cmd:qqvalue}, {cmd:multproc}, and {cmd:smileplot} all require input
datasets with one observation for each of a set of p-values, usually
corresponding to a set of estimated parameters.  Such input datasets may be
produced using the official Stata commands {helpb statsby} and 
{helpb postfile} or, alternatively, using the user-written Stata package 
{helpb parmest}.


{title:Examples}

{pstd}
The following example uses {cmd:auto.dta}, which is distributed with Stata.  The 
{helpb somersd} package is used to estimate the Somers' D parameters for rank
associations among a list of car-related variables and foreign origin.  The
{cmd:parmest} package then replaces the dataset in memory with a
new dataset that has one observation per estimated parameter and data on parameter
estimates, confidence limits, and p-values.  We then use {cmd:qqvalue} and the
Simes procedure to
calculate q-values corresponding to the p-values.

{phang2}{cmd:. sysuse auto}{p_end}
{phang2}{cmd:. somersd foreign price mpg headroom trunk weight length turn displacement gear_ratio, tdist}{p_end}
{phang2}{cmd:. parmest, norestore}{p_end}
{phang2}{cmd:. qqvalue p, method(simes) qvalue(myqval)}{p_end}
{phang2}{cmd:. list}{p_end}
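
{pstd}
Continuing with the same dataset, the intermediate results can be saved
alongside the q-values and listed in rank order.  This is a sketch only: the
new variable names are hypothetical, and {cmd:parm} is assumed to be the
parameter-name variable created by {cmd:parmest} under its default settings.

{phang2}{cmd:. qqvalue p, method(simes) npvalue(npval) rank(prank) svalue(sval) rvalue(rval) qvalue(myqval2)}{p_end}
{phang2}{cmd:. sort prank}{p_end}
{phang2}{cmd:. list parm p npval prank sval rval myqval2}{p_end}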

{pstd}
The following example also uses {cmd:auto.dta}.  It first uses the 
{cmd:somersd} package with the {helpb parmby} command of the
{cmd:parmest} package to create a new dataset in memory.  The dataset has one
observation for each of a list of rank correlations involving car price in
each car origin group (U.S. and foreign cars).  We then use {cmd:qqvalue} with
the {cmd:by} {varlist}{cmd::} prefix to demonstrate the calculation of two
separate sets of q-values, one for U.S.-made cars and one for foreign
cars.

{phang2}{cmd:. sysuse auto, clear}{p_end}
{phang2}{cmd:. parmby "somersd price mpg headroom trunk weight length turn displacement gear_ratio, tdist", by(foreign) norestore}{p_end}
{phang2}{cmd:. by foreign: qqvalue p, method(simes) qvalue(myqval)}{p_end}
{phang2}{cmd:. by foreign: list}{p_end}


{title:References}

{phang}
Newson, R., and the ALSPAC Study Team. 2003. {browse "http://www.stata-journal.com/article.html?article=st0035":Multiple-test procedures and smile plots.} {it:Stata Journal} 3: 109-132.

{phang}
Newson, R., and the ALSPAC Study Team. 2010. {browse "http://www.stata-journal.com/article.html?article=up0030":Software update: st0035_1: Multiple-test procedures and smile plots.} {it:Stata Journal} 10: 691-692.

{phang}
Storey, J. D. 2003. The positive false discovery rate: A Bayesian interpretation and the q-value. {it:Annals of Statistics} 31: 2013-2035.


{title:Author}

{pstd}Roger B. Newson{p_end}
{pstd}National Heart and Lung Institute{p_end}
{pstd}Imperial College London, UK{p_end}
{pstd}{browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk}{p_end}


{title:Also see}

{psee}Article:  {it:Stata Journal}, volume 10, number 4: {browse "http://www.stata-journal.com/article.html?article=st0209":st0209}

{psee}
{space 1}Manual:  {manlink D by}, {manlink D statsby}, {manlink P postfile}

{psee}
{space 3}Help:  {manhelp by D}, {manhelp statsby D}, {manhelp postfile P}, {helpb multproc}, {helpb smileplot}, {helpb parmest}, {helpb somersd} (if installed)
{p_end}
